Initial steps condensed for today

In the interest of time, we are going to vastly condense the first two steps. I have already done these steps on your instances. Additionally, we’ll get the third step running well before we actually need its results.

So you now should have

genome.fa genome.fa.fai SRR346368.fastq

in your /usr/workshop directory.

You can just enter ls to see this if you are already in the /usr/workshop directory.

If you are unsure what your current working directory is, type pwd. If in doubt about any of that, just enter the commands below in your terminal:

cd /usr/workshop
ls
  • The genome.fa file is the result of getting the genome data for the organism with which we’ll be working.
  • The genome.fa.fai file summarizes the length of each chromosome, which we’ll use in today’s work. The details of obtaining these files are in the early part of the pipeline.
  • The SRR346368.fastq file is the experimental data. Because it is important for today’s work to know what it is and where it came from, we’ll spend some time looking at it.
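
If you are curious what the chromosome-length summary in genome.fa.fai looks like, here is a minimal Python sketch for reading it. It assumes the file follows the usual samtools faidx layout (tab-separated, sequence name in the first column, length in bases in the second). The function name read_fai_lengths is made up for this illustration and is not part of the workshop pipeline.

```python
from pathlib import Path


def read_fai_lengths(fai_path):
    """Return {sequence_name: length_in_bases} from a .fai index file.

    Assumes the samtools faidx layout: tab-separated columns with the
    sequence name first and its length second. Other columns are ignored.
    """
    lengths = {}
    with open(fai_path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            fields = line.split("\t")
            lengths[fields[0]] = int(fields[1])
    return lengths


if __name__ == "__main__":
    # Path assumes you are in the directory that holds the workshop files.
    fai = Path("genome.fa.fai")
    if fai.exists():
        for name, length in read_fai_lengths(fai).items():
            print(f"{name}\t{length}")
    else:
        print("genome.fa.fai not found in the current directory")
```

Run from /usr/workshop, this should print one line per chromosome with its name and length. You do not need it for today’s steps; it is only a way to peek at what the index holds.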

Before that, though, in the interest of time we are going to start a rate-limiting step first and then build up to your understanding of what you did afterward.

Please enter the following in your terminal:

git clone https://github.com/fomightez/sequencework.git
cp sequencework/SOLiD/split_SOLiD_reads.py .
python split_SOLiD_reads.py SRR346368.fastq TAGCGT 10

Now we’ll talk about the two big parts of steps #1 and #2 that we skipped. That should take us to the point of the commands we just ran, and then we’ll continue with the workshop steps as documented, picking up with the Mapping reads section.

Note that the instructions for steps #1 and #2 start off assuming you have a clean, new system (the setup of which is described here and the pages after that). This obviously isn’t the case today, so we will not run those commands as we talk about those steps.